Human body analysis using depth data
Human body analysis is one of the broadest areas within the computer vision field. Researchers have put strong effort into human body analysis, especially over the last decade, owing to technological improvements in both video cameras and processing power. Human body analysis covers topics such as person detection and segmentation, human motion tracking, and action and behavior recognition. Although human beings perform all these tasks naturally, they pose a challenging problem from a computer vision point of view. Adverse conditions such as viewing perspective, clutter and occlusions, lighting, or variability of behavior among persons may turn human body analysis into an arduous task.
In the computer vision field, the evolution of research is usually tightly coupled to the technological progress of camera sensors and computer processing power. Traditional human body analysis methods are based on color cameras, so the information is extracted from raw color data, strongly limiting these approaches. A significant quality leap was achieved with the multiview concept, that is, having multiple color cameras recording a single scene at the same time. With multiview approaches, 3D information becomes available by means of stereo matching algorithms. Having 3D information is a key aspect in human motion analysis, since the human body moves in a three-dimensional space. Thus, problems such as occlusion and clutter may be overcome with 3D information.
The appearance of commercial depth cameras has brought about a second leap in the human body analysis field. While traditional multiview approaches require a cumbersome and expensive setup, as well as fine camera calibration, novel depth cameras directly provide 3D information with a single sensor. Furthermore, depth cameras may be rapidly installed in a wide range of situations, enlarging the range of applications with respect to multiview approaches. Moreover, since depth cameras are based on infrared light, they do not suffer from illumination variations.
In this thesis, we focus on the study of depth data applied to the human body analysis problem. We propose novel ways of describing depth data through specific descriptors that emphasize characteristics of the scene helpful for further body analysis. These descriptors exploit the special 3D structure of depth data to outperform generalist 3D descriptors and color-based ones. We also study the problem of person detection, proposing a highly robust and fast method to detect heads. This method is extended into a hand tracker, which is used throughout the thesis as a helpful tool to enable further research. In the remainder of this dissertation, we focus on the hand analysis problem as a subarea of human body analysis. Given the recent appearance of depth cameras, there is a lack of public datasets. We contribute a dataset for hand gesture recognition and fingertip localization using depth data. This dataset serves as the starting point for two proposals for hand gesture recognition and fingertip localization based on classification techniques. In these methods, we also exploit the aforementioned descriptors to adapt closely to the nature of depth data.
Designing Data: Proactive Data Collection and Iteration for Machine Learning
Lack of diversity in data collection has caused significant failures in
machine learning (ML) applications. While ML developers perform post-collection
interventions, these are time intensive and rarely comprehensive. Thus, new
methods to track and manage data collection, iteration, and model training are
necessary for evaluating whether datasets reflect real world variability. We
present designing data, an iterative, bias-mitigating approach to data
collection connecting HCI concepts with ML techniques. Our process includes (1)
Pre-Collection Planning, to reflexively prompt and document expected data
distributions; (2) Collection Monitoring, to systematically encourage sampling
diversity; and (3) Data Familiarity, to identify samples that are unfamiliar to
a model through Out-of-Distribution (OOD) methods. We instantiate designing
data through our own data collection and applied ML case study. We find models
trained on "designed" datasets generalize better across intersectional groups
than those trained on similarly sized but less targeted datasets, and that data
familiarity is effective for debugging datasets.
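The Data Familiarity step described above can be sketched with a simple distance-based out-of-distribution score. The k-nearest-neighbor scoring below is an illustrative stand-in under assumed feature vectors, not the paper's actual OOD method:

```python
import numpy as np

def familiarity_scores(train_feats, new_feats, k=5):
    """Score how unfamiliar each new sample is to the training set,
    using the mean distance to its k nearest training features.
    Lower score = more familiar; high scores flag OOD candidates."""
    scores = []
    for x in new_feats:
        d = np.linalg.norm(train_feats - x, axis=1)  # distances to all training samples
        scores.append(np.sort(d)[:k].mean())         # mean of the k nearest distances
    return np.array(scores)

# Toy example: training features cluster near the origin.
rng = np.random.default_rng(0)
train = rng.normal(0.0, 1.0, size=(200, 8))
in_dist = rng.normal(0.0, 1.0, size=(5, 8))    # familiar samples
out_dist = rng.normal(6.0, 1.0, size=(5, 8))   # unfamiliar (shifted) samples
s_in = familiarity_scores(train, in_dist)
s_out = familiarity_scores(train, out_dist)
```

Samples with the highest scores would be surfaced to the collector as unfamiliar to the model, prompting targeted collection.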
Collaborative voting of 3D features for robust gesture estimation
© 2017 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
Human body analysis raises special interest because it enables a wide range of interactive applications. In this paper we present a gesture estimator that discriminates body poses in depth images. A novel collaborative method is proposed to learn 3D features of the human body and, later, to estimate specific gestures. The collaborative estimation framework is inspired by decision forests, where each selected point (anchor point) contributes to the estimation by casting votes. The main idea is to detect a body part by accumulating the inference of other trained body parts. The collaborative voting encodes the global context of the human pose, while the 3D features represent local appearance. The contributions of body parts to the detection are interpreted as a voting process. Experimental results for different 3D features prove the validity of the proposed algorithm. Peer reviewed. Postprint (author's final draft).
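The voting idea described above can be illustrated with a minimal sketch: each anchor point casts a vote for a target body part at a learned offset from itself, and votes are accumulated on a grid whose densest cell wins. The anchors, offset, and grid here are made-up 2D illustrations, not the paper's trained forest:

```python
import numpy as np

def accumulate_votes(anchors, offsets, grid_shape=(32, 32), cell=0.1):
    """Accumulate votes cast by anchor points at given offsets onto a grid."""
    votes = np.zeros(grid_shape)
    for a in anchors:
        for off in offsets:
            target = a + off                        # predicted part position
            i = int(round(target[0] / cell))        # quantize to a grid cell
            j = int(round(target[1] / cell))
            if 0 <= i < grid_shape[0] and 0 <= j < grid_shape[1]:
                votes[i, j] += 1                    # one vote per anchor/offset pair
    return votes

# Hypothetical anchors near (1.0, 1.0) all voting with the same learned offset.
anchors = np.array([[1.01, 1.02], [0.99, 1.0], [1.0, 0.98]])
offsets = np.array([[0.5, 0.5]])
votes = accumulate_votes(anchors, offsets)
peak = np.unravel_index(votes.argmax(), votes.shape)   # densest cell = detection
```

In the actual method, offsets and vote weights would come from trained body-part models rather than a single fixed displacement.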
Radio amateur information sheet: Radio amateurs' examination and novice radio amateur examination
SIGLE. Available from British Library Document Supply Centre - DSC:7234.498(RA--184) / BLDSC - British Library Document Supply Centre, United Kingdom.
FascinatE Newsletter 1
This FascinatE newsletter explains how gesture recognition will be used in the FascinatE system, how our first test shoot went at a Premier League football match, and describes upcoming events. Postprint (published version).
Real-time head and hand tracking based on 2.5D data
A novel real-time algorithm for head and hand tracking is proposed in this paper. This approach is based on 2.5D data from a range camera, which is exploited to resolve ambiguities and overlaps. Experimental results show high robustness against partial occlusions and fast movements. The estimated positions are fairly stable, allowing the extraction of accurate trajectories which may be used for gesture classification purposes. Peer reviewed.
Real-time head and hand tracking based on 2.5D data
A novel real-time algorithm for head and hand tracking is proposed in this paper. This approach is based on 2.5D data from a range camera, which is exploited to resolve ambiguities and overlaps. The position of the head is estimated with depth-based template matching, its robustness being reinforced with an adaptive search zone. Hands are detected in a bounding box attached to the head estimate, so that the user may move freely in the scene. A simple method to decide whether the hands are open or closed is also included in the proposal. Experimental results show high robustness against partial occlusions and fast movements. Accurate hand trajectories may be extracted from the estimated hand positions, and may be used for interactive applications as well as for gesture classification purposes. Peer reviewed.
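The head estimation step described above combines depth-based template matching with a restricted search zone. The sketch below illustrates that combination on a synthetic depth map; the template size, window, and squared-difference score are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def match_in_zone(depth, template, center, half_win):
    """Find the best template position inside a search zone around `center`."""
    th, tw = template.shape
    ci, cj = center
    best, best_pos = np.inf, center
    # Only scan positions inside the (adaptive) search zone around the last estimate.
    for i in range(max(0, ci - half_win), min(depth.shape[0] - th, ci + half_win) + 1):
        for j in range(max(0, cj - half_win), min(depth.shape[1] - tw, cj + half_win) + 1):
            patch = depth[i:i + th, j:j + tw]
            score = np.mean((patch - template) ** 2)  # mean squared depth difference
            if score < best:
                best, best_pos = score, (i, j)
    return best_pos

depth = np.full((60, 60), 3.0)           # synthetic background at 3 m
depth[20:28, 30:38] = 1.0                # head-like blob at 1 m
template = np.full((8, 8), 1.0)          # depth template of the head
pos = match_in_zone(depth, template, center=(18, 28), half_win=6)
```

Restricting the scan to a zone around the previous estimate keeps the per-frame cost low, which is what makes such tracking feasible in real time.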
Oriented radial distribution on depth data: application to the detection of end-effectors
End-effectors are considered to be the main topological extremities of a given 3D body. Although the nature of such a body is not restricted, this paper focuses on the human body case. Detection of human extremities is a key issue in the human motion capture domain, being needed to initialize and update the tracker. Therefore, the effectiveness of human motion capture systems usually depends on the reliability of the obtained end-effectors. The increasing accuracy, low cost and easy installation of depth cameras have opened the door to new strategies to overcome the body pose estimation problem. With the objective of detecting the head, hands and feet of a human body, we propose a new local feature computed from depth data, which gives an idea of its curvature and prominence. This feature is weighted depending on recent detections, providing also a temporal dimension. Based on this feature, end-effector candidate blobs are obtained and classified into head, hands and feet according to three probabilistic descriptors. Peer reviewed.